Workload-sensitive approaches to improving graph data partitioning online
PhD Thesis
Many modern applications, from social networks to network security tools, rely upon
the graph data model, using it as part of an offline analytics pipeline or, increasingly,
for storing and querying data online, e.g. in a graph database management system
(GDBMS). Unfortunately, effective horizontal scaling of this graph data reduces to
the NP-Hard problem of “k-way balanced graph partitioning”.
Owing to the problem’s importance, several practical approaches exist, producing quality graph partitionings. However, these existing systems are unsuitable for partitioning
online graphs, either introducing unnecessary network latency during query processing, being unable to efficiently adapt to changing data and query workloads, or both.
In this thesis we propose partitioning techniques which are efficient and sensitive to
given query workloads, making them suitable for online graphs whose data and query
workloads change over time.
To incrementally adapt partitionings in response to workload change, we propose
TAPER: a graph repartitioner. TAPER uses novel data structures to compute the
probability of expensive inter-partition traversals (ipt) from each vertex, given the
current workload of path queries. Subsequently, it iteratively adjusts an initial partitioning by swapping selected vertices amongst partitions, heuristically maintaining low
ipt and high partition quality with respect to that workload. Iterations are inexpensive
thanks to time and space optimisations in the underlying data structures.
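
As a rough illustration of this kind of workload-sensitive refinement, the following Python sketch scores each vertex by the fraction of its workload-weighted traversals that leave its partition, then greedily relocates the worst offenders. It is a simplified stand-in rather than TAPER itself: the edge weights stand for assumed traversal frequencies, the function names (ipt_probability, total_ipt, refine) and the max_size balance bound are hypothetical, and one-way moves are used in place of vertex swaps.

```python
from collections import defaultdict


def ipt_probability(v, adj, part):
    """Fraction of v's workload-weighted traversals that leave v's partition."""
    total = sum(adj[v].values())
    if total == 0:
        return 0.0
    crossing = sum(w for u, w in adj[v].items() if part[u] != part[v])
    return crossing / total


def total_ipt(adj, part):
    """Total weight of inter-partition traversals (each undirected edge once)."""
    return sum(w for v in adj for u, w in adj[v].items()
               if v < u and part[v] != part[u])


def refine(adj, part, max_size, iterations=5):
    """Greedily move vertices towards the partition that attracts most of
    their traversal weight, never letting a partition exceed max_size."""
    part = dict(part)
    sizes = defaultdict(int)
    for p in part.values():
        sizes[p] += 1
    for _ in range(iterations):
        moved = False
        # Visit the vertices most likely to cause inter-partition traversals first.
        order = sorted(adj, key=lambda x: ipt_probability(x, adj, part),
                       reverse=True)
        for v in order:
            pull = defaultdict(float)
            for u, w in adj[v].items():
                pull[part[u]] += w
            best = max(pull, key=pull.get, default=part[v])
            if (best != part[v] and pull[best] > pull[part[v]]
                    and sizes[best] < max_size):
                sizes[part[v]] -= 1
                sizes[best] += 1
                part[v] = best
                moved = True
        if not moved:
            break
    return part


# Hypothetical example: two triangles joined by one rarely traversed edge.
# Weights are assumed traversal frequencies derived from a query workload.
adj = {
    0: {1: 5, 2: 5},
    1: {0: 5, 2: 5},
    2: {0: 5, 1: 5, 3: 1},
    3: {2: 1, 4: 5, 5: 5},
    4: {3: 5, 5: 5},
    5: {3: 5, 4: 5},
}
initial = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
refined = refine(adj, initial, max_size=4)
print(total_ipt(adj, initial), total_ipt(adj, refined))
```

In this toy case the refinement places each triangle in its own partition, so only the rarely traversed bridging edge remains inter-partition.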
To incrementally create partitionings in response to graph growth, we propose Loom:
a streaming graph partitioner. Loom uses another novel data structure to detect common patterns of edge traversals when executing a given workload of pattern matching
queries. Subsequently, it employs a probabilistic graph isomorphism method to incrementally and efficiently compare sub-graphs in the stream of graph updates to
these common patterns. Matches are assigned within individual partitions if possible,
thereby also reducing ipt and increasing partitioning quality w.r.t. the given workload.
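
The idea of keeping workload-relevant matches together can be sketched in Python as below. This is a heavy simplification and not Loom's method: an exact lookup of vertex-label pairs stands in for the probabilistic isomorphism check, patterns are limited to single edges, and every name here (stream_partition, frequent_pairs, capacity) is a hypothetical chosen for illustration.

```python
def stream_partition(edges, labels, frequent_pairs, k, capacity):
    """Assign vertices to k partitions as edges arrive in a stream.

    Edges whose endpoint labels match a frequently traversed pattern are
    co-located when capacity allows; other vertices go to the least-loaded
    partition.
    """
    part = {}
    sizes = [0] * k

    def place(v, preferred=None):
        # Vertices are assigned once and never moved afterwards.
        if v in part:
            return part[v]
        if preferred is not None and sizes[preferred] < capacity:
            p = preferred
        else:
            p = min(range(k), key=lambda i: sizes[i])
        part[v] = p
        sizes[p] += 1
        return p

    for u, v in edges:
        pair = frozenset((labels[u], labels[v]))
        if pair in frequent_pairs:
            # The edge matches a common traversal pattern: keep both ends together.
            if u in part:
                place(v, part[u])
            elif v in part:
                place(u, part[v])
            else:
                place(v, place(u))
        else:
            place(u)
            place(v)
    return part


# Hypothetical workload in which Person-Post edges are traversed most often.
labels = {"a": "Person", "b": "Post", "c": "Person", "d": "Post", "e": "Tag"}
frequent_pairs = {frozenset(("Person", "Post"))}
edges = [("a", "b"), ("c", "d"), ("b", "e"), ("a", "c")]
assignment = stream_partition(edges, labels, frequent_pairs, k=2, capacity=3)
print(assignment)
```

Matching whole multi-edge sub-graphs within a window of the stream, as described above, would require buffering edges before assignment, which this sketch omits.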
Both partitioner and repartitioner are extensively evaluated with real/synthetic graph
datasets and query workloads. The headline results include that TAPER can reduce
ipt by up to 80% over a naive existing partitioning and can maintain this reduction in
the event of workload change, through additional iterations. Meanwhile, Loom reduces
ipt by up to 40% over a state-of-the-art streaming graph partitioner.